Collective evolution learning model for vision-based collective motion with collision avoidance

Collective motion (CM) takes many forms in nature: schools of fish, flocks of birds, and swarms of locusts, to name a few. Commonly, during CM the individuals of the group avoid collisions. These CM and collision avoidance (CA) behaviors are based on input from the environment such as smell, air pressure, and vision, all of which are processed by the individual and translated into action. In this work, a novel vision-based CM with CA model (i.e., VCMCA) simulating the collective evolution learning process is proposed. In this setting, a learning agent obtains a visual signal about its environment, and through trial-and-error over multiple attempts, the individual learns to perform a local CM with CA, from which a global CM with CA dynamics emerges. The proposed algorithm was evaluated in the case of locust swarms, showing the evolution of these behaviors in a swarm from the learning process of the individual in the swarm. Thus, this work proposes a biologically-inspired learning process to obtain multi-agent multi-objective dynamics.


Introduction
Collective motion (CM) is a phenomenon occurring in homogeneous populations of interacting individuals [1][2][3][4]. It is manifested in multiple forms in nature, such as locusts aggregating into groups [5], birds flocking in an organized manner [6,7], and fish schools responding to predators and showing migratory movements [8,9]. During the last few decades, researchers have sought to elucidate the mechanisms allowing animals to achieve synchronized motion while lacking centralized control. Empirical data from experimental observations [10,11] are analyzed with the aim of uncovering the individual's behavior using mathematical and computational models [12][13][14].
A large body of work can be categorized as agent-based models, where the collective dynamics are explained by the action on the individual level [15]. One of the leading approaches in agent-based simulation for CM is Self-Propelled Particles, popularized by the model proposed by Vicsek et al. [16], where agents align their velocity according to short-range interaction with neighboring agents, commonly known as the metric-based approach. [...] accurately aware of the location and velocities of the neighboring agents. In addition, Young and La [47] presented a hybrid system that achieves an efficient CM behavior for escaping attacking predators while maintaining a flocking formation using multiple reinforcement learning methods. Additionally, it was shown that independent learning with and without function approximation proved to be unreliable in learning to flock together towards the same target. Similarly, in our work, it is assumed the focal agent is accurately aware of the location and velocities of the neighboring agents while solving a multi-objective task. Nonetheless, they were not able to obtain a collective behavior by learning a policy at the individual level as obtained by Durve et al. [46]. Lopez-Incera et al. [48] studied the collective behavior of artificial learning agents, driven by a reinforcement learning action-making process and an abstract sensing mechanism, that arises as they attempt to survive in foraging environments. They designed multiple foraging scenarios in one-dimensional worlds in which the resources are either near or far from the region where agents are initialized and showed the emergence of CM [48]. While their work took into consideration a sensing-based approach, they investigated a single-objective behavior in an extremely simplistic scenario.
The novelty of the proposed work lies in the development of a Collective Evolution Learning (CEL) algorithm that allows learning a multi-objective collective behavior based on the optimization of a single member's decision-making process even with noisy input data. In particular, we show that CEL is able to achieve flocking with inter-agent collision avoidance using only vision-based inputs. Formally, our model produces vision-based collective motion (VCM) group behavior combined with collision avoidance (CA), or VCMCA for short. The model is obtained by simulating the evolution process of an animal population that learns the desired behavior over time using a trial-and-error mechanism. We evaluate the performance of the algorithm to obtain a VCMCA model in different settings, showing robust results. The structure of the remainder of the manuscript is shown in Fig 1.

Vision-based collective motion with collision avoidance
In this section, we describe the proposed VCMCA model, constructed from three components: the vision-based input mechanism on the agent level, the movement and collision physical mechanism, and the swarm-level CM and CA dynamics. A table that summarizes the parameters used in the model, their definitions, and their notations is provided in the Appendix.

Settings and vision-based input
In order to represent the open-field conditions that characterize swarming organisms' environments in a simple manner, we define a two-dimensional world with two-dimensional agents, in a similar fashion to Vicsek's model [23]. We use the metric-based approach to define the neighbors of an agent.
The model ($M$) is defined by the tuple $M := (E, P)$ where $E \subset \mathbb{R}^2$ is the environment and $P$ is a fixed-size ($|P| := N$) population of rectangular-shaped agents populating it. Formally, the environment, $E$, is a continuous space with height $H \in \mathbb{N}$ and width $W \in \mathbb{N}$ with a toroidal geometrical configuration, namely, a two-dimensional rectangle with periodic boundary conditions. Each agent, $p \in P$, is defined by a tuple $p := (l, v, a, h, w, r, \gamma)$ where $l = (l_x, l_y) \in E$ is the location of the agent's center of mass, $v \in \mathbb{R}^2$ is the current velocity vector of the agent, $a \in \mathbb{R}^2$ is the current acceleration vector of the agent, $h, w$ are the agent's height and width, respectively, $r$ is the metric-based sensing radius of the agent, and $\gamma$ is the agent's vision-based input, defined below.
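The agent tuple and the periodic environment can be sketched as a small data structure. This is a minimal illustration, not the authors' simulator: the names `Agent`, `wrap`, and `torus_delta` are hypothetical, the example values are arbitrary, and measuring displacements with the minimum-image convention is an assumption about how distances behave on the torus.

```python
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float]

@dataclass
class Agent:
    """One rectangular agent p = (l, v, a, h, w, r); gamma is computed, not stored."""
    l: Vec    # location of the center of mass
    v: Vec    # current velocity
    a: Vec    # current acceleration
    h: float  # height (body length)
    w: float  # width
    r: float  # metric-based sensing radius

def wrap(l: Vec, W: float, H: float) -> Vec:
    """Map a location back into the W x H rectangle (periodic boundaries)."""
    return (l[0] % W, l[1] % H)

def torus_delta(a: Vec, b: Vec, W: float, H: float) -> Vec:
    """Shortest displacement from a to b on the torus (assumed minimum-image rule)."""
    dx = (b[0] - a[0] + W / 2) % W - W / 2
    dy = (b[1] - a[1] + H / 2) % H - H / 2
    return (dx, dy)
```

With this convention, two agents near opposite edges of the rectangle are treated as close neighbors, which is what a toroidal world implies for a metric-based sensing radius.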
The parameters used for the agent's visual mechanism are inspired by previous works dealing with vision-based information acquisition of flocking animals [43,49]. Formally, the vision-based input is parameterized by $\gamma := (\psi, \dot{\psi}, \theta, \dot{\theta})$, where $\theta$ and $\psi$ are the angular position and the area subtended on the retina of the focal agent, respectively, and $\dot{\theta}, \dot{\psi}$ are their respective derivatives in time.
Based on the sensing radius, $r$, we define the local environment of a focal agent $p_o \in P$ (we use $o$ to denote parameters related to the focal agent) to be the subset of agents $P^*$ that are located within sensing distance of the focal agent, $P^* := \{p_i \in P : \lVert l_i - l_o \rVert < r_o\}$ such that $i \neq o$. Now, each agent in the local environment ($p_i \in P^*$) of a focal agent $p_o$ is sensed using the vision mechanism if it is not fully obscured by other agents. The vision-based properties ($\gamma$) are computed for each neighbor $n$ in the local environment of the focal agent, resulting in a vector $\nu \in \mathbb{R}^{|\gamma| \cdot |P^*|}$ such that $\nu := [\gamma_1, \dots, \gamma_{|P^*|}]$. The subtended angular area $\psi$ for each neighbor $n$ is obtained as follows: [...] where $w_n^i$ are the relative positions of the corners of the $n$-th neighbor as seen in the focal agent frame. Formally, the lab frame coordinates of the corners of the $n$-th neighbor: [...] such that index $i \in \{1, 2, 3, 4\}$, $v_n^{\perp}$ is the perpendicular vector to the velocity vector $v$ of the neighbor agent, and $l_n$ is the position of the $n$-th neighbor's center of mass. The coordinates in the focal agent ($o$) frame would result in $w_n^i = r_n^i - l_o$. Thus, the angular position parameter $\theta$ of the $n$-th neighbor is defined as follows: [...] An agent can be fully hidden from the focal agent by other agents in its local environment. In such a case, it is not taken into consideration in the vision-based computation. A schematic view of the focal agent's vision-based sensing ($\gamma$) is shown in Fig 2. More details about the computation of Eqs (2) and (3) are provided in the Appendix.
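As a rough illustration of how $\theta$ and $\psi$ could be computed from the rectangle corners, the sketch below uses assumed definitions, not necessarily the paper's: the corners are placed at $l_n \pm (h/2)\hat{v}_n \pm (w/2)\hat{v}_n^{\perp}$, $\theta$ is the signed bearing of the neighbor's center relative to the focal heading, and $\psi$ is the angular extent spanned by the four corners. Occlusion and the time derivatives are omitted, and all function names are hypothetical.

```python
import math

def unit(v):
    """Normalize a non-zero 2D vector."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def corners(l_n, v_n, h, w):
    """Lab-frame corners of a rectangle centered at l_n, long axis along v_n."""
    ux, uy = unit(v_n)   # heading direction
    px, py = -uy, ux     # perpendicular to the heading
    return [
        (l_n[0] + sa * h / 2 * ux + sb * w / 2 * px,
         l_n[1] + sa * h / 2 * uy + sb * w / 2 * py)
        for sa in (1, -1) for sb in (1, -1)
    ]

def signed_angle(a, b):
    """Signed angle from vector a to vector b, in [-pi, pi]."""
    return math.atan2(a[0] * b[1] - a[1] * b[0], a[0] * b[0] + a[1] * b[1])

def vision_features(l_o, v_o, l_n, v_n, h, w):
    """Return (theta, psi) of one neighbor as seen by the focal agent."""
    center = (l_n[0] - l_o[0], l_n[1] - l_o[1])
    theta = signed_angle(v_o, center)  # bearing of the neighbor's center
    # Corner positions in the focal frame: w_i = r_i - l_o.
    rel = [(cx - l_o[0], cy - l_o[1]) for cx, cy in corners(l_n, v_n, h, w)]
    offsets = [signed_angle(center, c) for c in rel]
    psi = max(offsets) - min(offsets)  # angular extent of the visible body
    return theta, psi
```

Under these assumptions, an elongated neighbor seen from the side subtends a much larger $\psi$ than the same neighbor seen end-on at the same distance, so the same distance can map to very different visual inputs.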

Agent's movement and collision dynamics
The proposed model has a discrete, synchronized clock that all the agents, $P$, follow. At the beginning of the simulation ($t_0$), the locations and velocities of the agents' population ($P$) are chosen at random from a pre-defined distribution. Then, for each clock tick (marked by $t_i$ for the $i$-th tick) the following happens: first, each agent $p_o \in P$, in a random order, senses its local environment and computes its action $x(t) \in \mathbb{R}^2$ based on a given decision-making model. Second, each agent moves as follows: [...] where $d \in \mathbb{R}^+$ is a weight indicating the influence of the desired direction ($x(t_i)$) on the current direction. Due to the toroidal geometrical configuration of the environment, if $l \notin E$ then $l_x^{new} = \mathrm{mod}(l_x, W)$ and $l_y^{new} = \mathrm{mod}(l_y, H)$. In case two agents $p_1 \in P$, $p_2 \in P$ collide, a collision protocol is employed: first, the simulation identifies each of the colliding agents as either an active or a passive collider. Namely, in a head-tail collision, the head-colliding agent is regarded as the active collider and the tail-colliding agent as the passive one. In the case of a frontal collision, both parties are regarded as active colliders. The active colliding agent(s) are stopped (i.e., $v(t_i) = 0$) while the passive colliding agent's velocity follows a purely elastic collision: [...] where $v_{colliding}(t_i)$ is the velocity of the colliding agent at the time of the collision ($t_i$). A schematic view of the perception-action-making loop of a single agent is presented in Fig 2b.
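A single clock tick for one agent might look like the sketch below. Only the modular wrap of the location is taken from the text; the heading update is an assumed stand-in (the new heading is the normalized sum of the current heading and $d$ times the desired direction, with the speed unchanged), and the function name `step` is hypothetical. The collision protocol is not modeled here.

```python
import math

def step(l, v, x, d, W, H):
    """Advance one tick: blend the desired direction x into v, move, wrap.

    Assumed update rule, not the paper's equation: the new heading is the
    normalized sum v_hat + d * x_hat, and the speed |v| is kept unchanged.
    """
    speed = math.hypot(v[0], v[1])
    x_norm = math.hypot(x[0], x[1])
    if speed == 0.0 or x_norm == 0.0:
        new_v = v
    else:
        bx = v[0] / speed + d * x[0] / x_norm
        by = v[1] / speed + d * x[1] / x_norm
        b_norm = math.hypot(bx, by)
        new_v = v if b_norm == 0.0 else (speed * bx / b_norm, speed * by / b_norm)
    # Periodic boundaries: l_new = (mod(l_x, W), mod(l_y, H)), as in the text.
    new_l = ((l[0] + new_v[0]) % W, (l[1] + new_v[1]) % H)
    return new_l, new_v
```

With $d = 0$ the agent keeps its heading, and larger $d$ turns it more sharply toward the desired direction; an agent leaving through one edge re-enters from the opposite one.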

Collective motion and collision avoidance
We define the agents' population CM metric by the degree of alignment of the velocity headings of the simulated agents, in a similar fashion to other CM models' metrics [4,16]. Formally, the CM metric is a function $F : \mathbb{P} \to \mathbb{R}$ where $\mathbb{P}$ is the space of all possible agent populations. $F(P)$ is defined as follows: $F(P) := \sqrt{\sum_{p \in P} (v - \bar{v})^2}$ where $\bar{v} := \sum_{p \in P} v / |P|$. In addition, we define a collision between two agents $p_1 \in P$ and $p_2 \in P$ as taking place if and only if: [...] The motivation behind this definition is to check if two agents $p_1$, $p_2$ would collide after they both make a move. In order to evaluate if a collision happened, we check if the distance between the agents' centers of mass ($l_1 + v_1$ and $l_2 + v_2$) is smaller than their width or height. This definition is optimal when assuming circular-shaped agents. Nonetheless, for non-circular-shaped agents, such as in our case, it is somewhat overly strict, as there are angles at which the condition is met but the agents do not physically collide. Therefore, a function $C : \mathbb{P} \to \mathbb{N}$, such that $C(P)$ counts the collisions between the agents of the population relative to the possible number of collisions at a single point in time, is used as the CA metric. In our context, $C$ is normalized to $[0, 1]$ by dividing $C$ by the largest hypothetical number of collisions possible in a single step in time.
Based on these two metrics, the objective of the CM with CA takes the form: [...] where $O$ is the space of all possible policies of the agents and $z \in [0, 1]$ is the weight of the CM score compared to the CA score. The optimization is done on any finite duration in time $[t_0, \dots, t_f] \subset \mathbb{N}$ such that $t_0 < t_f$ and $F_t$ and $C_t$ represent the CM and CA at time $t$, respectively.
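The two metrics and their weighted combination can be sketched as follows. Several details are assumptions made for illustration: the CM metric is taken as the root of the summed squared deviations from the mean velocity, the collision threshold is a single hypothetical `size` value, the normalizer of the CA metric is the number of agent pairs, the objective is a time-averaged weighted sum $z F_t + (1 - z) C_t$, and periodic boundaries are ignored in the distance check.

```python
import math
from itertools import combinations

def cm_metric(velocities):
    """Velocity dispersion around the mean velocity (0 means perfect alignment)."""
    n = len(velocities)
    mx = sum(v[0] for v in velocities) / n
    my = sum(v[1] for v in velocities) / n
    return math.sqrt(sum((v[0] - mx) ** 2 + (v[1] - my) ** 2 for v in velocities))

def collides(l1, v1, l2, v2, size):
    """Would the two centers be closer than `size` after both agents move?"""
    dx = (l1[0] + v1[0]) - (l2[0] + v2[0])
    dy = (l1[1] + v1[1]) - (l2[1] + v2[1])
    return math.hypot(dx, dy) < size

def ca_metric(locations, velocities, size):
    """Collisions divided by the number of agent pairs, so the value lies in [0, 1]."""
    pairs = list(combinations(range(len(locations)), 2))
    hits = sum(
        collides(locations[i], velocities[i], locations[j], velocities[j], size)
        for i, j in pairs
    )
    return hits / len(pairs)

def cmca_objective(F_series, C_series, z):
    """Time-averaged weighted sum z * F_t + (1 - z) * C_t (lower is better)."""
    T = len(F_series)
    return sum(z * F + (1 - z) * C for F, C in zip(F_series, C_series)) / T
```

Setting `z = 1` reduces the objective to pure alignment and ignores collisions, which corresponds to the CM-only configuration used as a baseline in the experiments.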

Motivation
The CM behavior of locusts is believed to have developed over time through an evolutionary process. The locusts try to adapt to their environment and generate the next population of locusts. During this process, multiple locusts experience similar situations and act slightly differently based on the unique properties of each locust, which results in different outcomes. Locusts that perform better are more likely to reproduce and keep their behavior in the population over time. As such, the desired behavior (e.g., CM with CA) emerged through the collective experience of the locusts and improved over time using a natural trial-and-error approach. Based on this motivation, we propose a reinforcement learning (RL) based algorithm ($A$) called Collective Evolution Learning (CEL for short). The proposed algorithm is based on the assumption that the agents in a swarm are identical and share common knowledge (policy). Intuitively, the CEL algorithm receives the observable parameters ($\gamma$) of a subset of agents ($P^* \subset P \setminus \{p\}$) in the local environment ($R$) of an agent ($p$) and returns the action ($x$) that best approximates the optimal action of the CM with CA objective metric (Eq (8)) using a Q-learning-based model [50,51]. Since both CM and CA are swarm-level metrics, CEL has an intermediate genetic algorithm (GA) based model [52][53][54][55][56] to optimize CM and CA on the agent's level. Moreover, to obtain a well-performing Q-learning-based model, one is required to sample the state space of the model well [57]. In order to do that, CEL introduces a layer on top of the Q-learning model that measures its performance for different scenarios and uses the K-nearest neighbors (KNN) algorithm [58] to define a new sampling strategy for further training of the Q-learning model. A schematic view of the model training, validation, and testing phases with the interaction between them is presented in

Swarm dynamics using a Q-learner
[...] where $k \in \mathbb{N}$ is the maximum possible number of agents that an agent $p \in P$ can sense at a single point in time multiplied by the number of vision-based parameters per agent (i.e., $\psi$, $\theta$, $\dot{\psi}$, and $\dot{\theta}$), $\nu \in \mathbb{R}^{4k}$ is the outcome of the vision-based sensing of an agent $p \in P$, and $X := \{x_i\}_{i=1}^{n}$ is the set of all possible actions an agent can make. Following this definition, one can treat $A$ as a policy function that gets a state $\nu$ for each agent and returns the action $x$ it should perform. To learn the policy $A$, we use a Q-learning-based model. The Q-learner stochastically maps each state $\nu$ to the set of possible actions $\{x_i\}_{i=1}^{n}$ by assuming a discretization of both the input and output spaces. Hence, it is assumed that the number of possible actions is finite and that $\nu \in \mathbb{N}^{4k}$ is obtained by the discretization of each vision parameter to define a classification problem rather than a regression problem.
The process of obtaining the model is divided into three main phases: training, evaluation, and testing. The training and evaluation phases are executed sequentially multiple times until a stop condition is satisfied. Afterward, the testing phase takes place to compute the VCMCA score of the obtained model.
The training phase is further divided into the first time the training phase is utilized and any other time afterward. For the first time, the Q-learning matrix is initialized with all possible states and their actions such that each action has a uniform probability of being chosen for each state. Afterward, and for each utilization of the training phase, the training phase includes three sequential processes. First, a scenario including zero, one, or more (limited by $k$) neighbor agents and a focal agent is generated at random and in an iterative manner. Based on this scenario, the Q-learner's state is computed from the focal agent's point of view. Second, an approximation to the optimal actions for this scenario is obtained using a GA-based model aiming to optimize the CMCA for $Z \in \mathbb{N}$ steps in time. This step is described in detail in Section 3.3. Lastly, the results of the algorithm are used to update the Q-learner. The Q-learner is updated by adding to the current value a score obtained by the GA process multiplied by some pre-defined learning rate, $\lambda \in [0, 1]$. This process repeats itself $Z_{train} \in \mathbb{N}$ times.
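The tabular update described above can be sketched as follows. This is a minimal illustration under stated assumptions: states are hashable tuples of discretized vision parameters, unseen states start with equal weights (a uniform action distribution), and the stochastic policy is taken to be proportional to the stored weights, which the text does not specify. The class name `QTable` and its methods are hypothetical.

```python
import random
from collections import defaultdict

class QTable:
    """Tabular learner over discretized states; one weight per action and state."""

    def __init__(self, n_actions, learning_rate):
        self.n_actions = n_actions
        self.lr = learning_rate
        # Unseen states start with equal weights, i.e. a uniform action distribution.
        self.q = defaultdict(lambda: [1.0] * n_actions)

    def update(self, state, action, ga_score):
        """Add the GA score, scaled by the learning rate, to the current value."""
        self.q[state][action] += self.lr * ga_score

    def policy(self, state):
        """Action probabilities, assumed proportional to the stored weights."""
        weights = [max(wt, 0.0) for wt in self.q[state]]
        total = sum(weights)
        if total == 0.0:
            return [1.0 / self.n_actions] * self.n_actions
        return [wt / total for wt in weights]

    def act(self, state, rng=random):
        """Sample an action stochastically from the policy."""
        return rng.choices(range(self.n_actions), weights=self.policy(state))[0]
```

For example, after `q = QTable(3, 0.5)` and `q.update((0, 0), 1, 2.0)`, the weights of state `(0, 0)` are `[1.0, 2.0, 1.0]`, so the second action is sampled with probability 0.5.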
The evaluation phase includes three sequential processes. First, a set of scenarios of size $Z_{validate} \in \mathbb{N}$ is generated at random. For each scenario, the Q-learner's state is computed from the focal agent's point of view. Then, the performance of the Q-learner for each state is computed by simulating $\eta$ steps in time such that both the focal and neighbor agents follow the Q-learner's model. Afterward, CEL uniformly samples the state space and computes the estimated performance of the model on these values using the distance-weighted KNN algorithm with $y \in \mathbb{N}$ neighbor data points. The outcome of the last step is a $4k$-dimensional distribution function. If the total variation distance between the obtained distribution and a uniform distribution is larger than a pre-defined threshold $k \in \mathbb{R}^+$, then CEL returns to the training phase and samples the next set of training samples from the obtained distribution. Otherwise, CEL moves to the testing phase.
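The evaluation check can be sketched on a finite grid of states. The sketch assumes inverse-distance weights for the KNN estimate, a discrete grid in place of the continuous $4k$-dimensional distribution, and a plain normalization of the estimated values into a distribution; none of these details are stated in the text, and the function names are hypothetical.

```python
import math

def knn_estimate(query, points, values, n_neighbors):
    """Distance-weighted KNN: average the nearest values, weighted by 1/distance."""
    nearest = sorted(
        (math.dist(query, p), val) for p, val in zip(points, values)
    )[:n_neighbors]
    for d, val in nearest:
        if d == 0.0:  # exact match: return its value directly
            return val
    wsum = sum(1.0 / d for d, _ in nearest)
    return sum(val / d for d, val in nearest) / wsum

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [wt / total for wt in weights]

def total_variation_to_uniform(dist):
    """Total variation distance between a discrete distribution and the uniform one."""
    u = 1.0 / len(dist)
    return 0.5 * sum(abs(p - u) for p in dist)

def needs_more_training(grid, points, values, n_neighbors, threshold):
    """True if the estimated performance map is too far from uniform."""
    est = [knn_estimate(g, points, values, n_neighbors) for g in grid]
    return total_variation_to_uniform(normalize(est)) > threshold
```

When the estimated performance is the same everywhere, the distance to the uniform distribution is zero and training stops; strongly uneven performance pushes the distance up and triggers another training round.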
The testing phase includes three sequential processes. First, CEL generates a set of simulation scenarios of size $Z_{test} \in \mathbb{N}$ at random. Notably, the simulation scenarios are of the entire population and not of a single focal agent and its neighbor agents as used during the training and evaluation phases. Second, for each simulation scenario, the simulation is performed for $T \in \mathbb{N}$ steps in time ($T \gg 1$) such that all the agents in the population follow the Q-learner model. Lastly, the performance of the CEL model is defined to be the average CMCA score of the simulations.

Agent's decision-making using a genetic algorithm
An approximation to the optimal actions for a given scenario is obtained using a GA-based model aiming to optimize the CMCA score of a single agent over a short duration of time. To use the GA, we first define a local, individual objective function, based on the collective metric Eq (8), as follows: [...] where $P^* \subset P$ is the subset of agents in the local environment of a focal agent $p \in P$, and $\varrho$ is a gene used by the GA. This definition assumes a semi-Markovian property [59] of the action-making processes of the agents, which means that the agents aim to perform the optimal action for the next state, taking into consideration the current and previous states. First, the gene population is initialized. Formally, a gene population of size $t \in \mathbb{N}$ is generated, such that each gene ($\varrho$) is defined by a vector of size $\eta$ corresponding to the number of steps in time the focal agent needs to perform. Each value of the vector is an action $x \in X$. Afterward, for $g \in \mathbb{N}$ generations (iterations), three operations take place: crossover, mutation, and next generation. To be exact, the cross-over operator $COO(\varrho_1, \varrho_2, w) \to (\varrho'_1, \varrho'_2)$ gets two genes $\varrho_1, \varrho_2$ and a weight vector $w \in [0, 1]^{\eta}$ and returns two new genes ($\varrho'_1, \varrho'_2$) [...] is an index of the genes' vector sampled from a distribution defined by $w$ [60]. The mutation operator $MO(\varrho) \to \varrho'$ gets a gene $\varrho$ and returns a new gene $\varrho'$ such that one of the values of $\varrho$ is replaced with a different action in a uniformly distributed manner. The next generation operator (NGO) keeps a portion $\rho \in [0, 1]$ of the genes with the highest fitness score while the remaining genes are chosen by a distribution defined by the $L_1$ normalization of the fitness scores of the remaining genes [61]. Once the GA finishes its calculations, the gene with the best fitness score is taken. From this gene, the first action is provided to the Q-learner.
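The three operators can be sketched as below. The sketch makes several assumptions that the text does not confirm: the crossover is a single-point tail swap at an index sampled with weights $w$, every child is mutated once, parents and children compete together for the next generation, and the fitness function is supplied by the caller. All function names are hypothetical.

```python
import random

def crossover(g1, g2, w, rng):
    """Swap the tails of two genes at a cut index sampled with weights w."""
    i = rng.choices(range(len(g1)), weights=w)[0]
    return g1[:i] + g2[i:], g2[:i] + g1[i:]

def mutate(gene, actions, rng):
    """Replace one random position with a different, uniformly drawn action.

    Requires at least two distinct actions.
    """
    i = rng.randrange(len(gene))
    new = rng.choice([a for a in actions if a != gene[i]])
    return gene[:i] + [new] + gene[i + 1:]

def next_generation(pool, fitness, rho, size, rng):
    """Keep the top rho fraction; fill the rest by fitness-proportional sampling."""
    ranked = sorted(pool, key=fitness, reverse=True)
    n_elite = max(1, int(rho * size))
    elite, rest = ranked[:n_elite], ranked[n_elite:]
    if not rest or n_elite >= size:
        return elite[:size]
    # choices() normalizes the weights, which matches an L1 normalization.
    weights = [max(fitness(g), 0.0) + 1e-12 for g in rest]
    return elite + rng.choices(rest, weights=weights, k=size - n_elite)

def run_ga(fitness, actions, eta, pop_size, generations, rho, seed=0):
    """Return (first action of the fittest gene, fittest gene)."""
    rng = random.Random(seed)
    genes = [[rng.choice(actions) for _ in range(eta)] for _ in range(pop_size)]
    w = [1.0] * eta  # uniform cut-point weights (assumed)
    for _ in range(generations):
        rng.shuffle(genes)
        children = []
        for g1, g2 in zip(genes[0::2], genes[1::2]):
            c1, c2 = crossover(g1, g2, w, rng)
            children += [mutate(c1, actions, rng), mutate(c2, actions, rng)]
        genes = next_generation(genes + children, fitness, rho, pop_size, rng)
    best = max(genes, key=fitness)
    return best[0], best
```

Because at least one elite gene always survives unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.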
Notably, one does not have to use the GA approach, as in theory the Q-learning algorithm should converge to a similar policy without the GA component. However, as the number of samples that result in better fitness is very small compared to the size of the action space, a Q-learner would require a very large number of samples to achieve similar results. As such, it would cause a much longer training process.
A time complexity analysis of CEL is provided in the Appendix.

Results
To evaluate the ability of the CEL algorithm to obtain a VCMCA model and explore the influence of several properties of the agents on the model's performance, we conduct three experiments. First, we measure the performance of the proposed algorithm compared to two other approaches: vision-based CM without CA and random walk. Second, we analyze the influence of population density, agent's height-to-width ratio, and sensing radius on the VCMCA model's performance. Finally, we approximate the VCMCA model obtained by the CEL algorithm using an explainable Random Forest [62] model to extract the rule-based structure of the agent's action-making process, allowing the user to interpret the connection between the visual stimuli and the agent's reaction to them.

Setup
We implemented the VCMCA model and CEL algorithm as a simulator for the case of locusts. We chose locusts since, at the beginning of the flocking, they move on the ground, which is approximated by a 2D plane. However, the model can be adapted to other species as well. The simulation is divided into two phases: 1) agent's individual VCMCA model training using the CEL algorithm; and 2) usage of the obtained VCMCA model in the locust swarm simulation. The default hyperparameters' values used in the simulation are picked to represent a population of locusts that become locusts. The parameters with their descriptions and values are summarized in Table 1. If not stated otherwise, these hyperparameters' values are used across all experiments. In particular, the agent's height-to-width ratio is set to be 7/1. This ratio is obtained by rounding the average ratio of 50 locusts to the closest natural number. To be exact, we manually tagged the bounding boxes of 50 upper-view locust images, obtaining an average height-to-width ratio of 6.78 with a standard deviation of 1.48. Specifically, the locust images were of the Schistocerca cancellata and Schistocerca gregaria species since these are known to exhibit swarming behaviors and density-dependent phase transition [63,64]. In addition, group population sizes were set in accordance with available biological records of locust swarms [64,65]. Though the exact density values are one order of magnitude larger than in the simulations, we concentrate on the emerging phases of the flock when the density is yet to reach its maximal numbers. The model uses abstract discrete time steps. In order to calibrate the simulation's abstract time step, we define each time step to be the average duration required by the average locust (agent) to make a decision and move [66].

CEL algorithm training
We train the locusts' VCMCA model using the CEL algorithm n = 30 times to evaluate the average relationship between the number of training samples and the obtained VCMCA model's performance, due to the stochastic nature of the algorithm. In Fig 4 the blue line indicates the training fitness function value (i.e., Eq (9)), and the green line indicates the CMCA score (i.e., Eq (8)). The results are shown as mean ± standard deviation. The vertical dashed-yellow lines indicate the validation phase during the training process as described in Fig 3. We fitted the average training fitness and CMCA score using the least mean square method, obtaining $F_1$: fitness = 0.189 ln(samples) + 0.053 and $F_2$: cmca = 0.196 ln(samples) − 0.268 with coefficients of determination $R^2 = 0.989$ and $R^2 = 0.972$, respectively. Hence, the functions $F_1$ and $F_2$ fit the dynamics well. Since both functions are monotonically increasing, this suggests a connection between the CMCA score of an individual and the population's CMCA score. Similarly, the Pearson correlation between the training fitness and CMCA scores in the validation stages is $R^2 = 0.971$ with a p-value of 0.0057. Thus, the scores are strongly linearly correlated as well.
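A logarithmic fit of this kind reduces to ordinary least squares on ln(samples). The sketch below is a generic illustration with a hypothetical function name `fit_log`, not the authors' fitting code.

```python
import math

def fit_log(samples, scores):
    """Least-squares fit of score = a * ln(samples) + b; returns (a, b, r_squared)."""
    xs = [math.log(s) for s in samples]
    n = len(xs)
    mx, my = sum(xs) / n, sum(scores) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, scores))
    ss_tot = sum((y - my) ** 2 for y in scores)
    return a, b, 1.0 - ss_res / ss_tot
```

Feeding it points generated exactly from $F_1$ recovers the slope 0.189 and the intercept 0.053 up to rounding error; on real training curves the returned coefficient of determination indicates how well the logarithmic form describes the data.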
Screenshots of the simulator at the beginning and end of the simulation of an average performance VCMCA model after 50 million training samples are shown in Fig 5a and 5b, respectively. Each agent is represented by a black rectangle with a surrounding gray circle that represents the sensing radius of the agent and its index number. The agents marked in dashed blue indicate that they have collided. One can see that in Fig

Baseline
We used the obtained VCMCA model to evaluate the CM, CA, and CMCA scores over time. We define the scores as the complement values to the objective functions of CM (Eq 6), CA (Eq 7), and CMCA (Eq 8). Thus, higher values of CM or CA scores signify better flocking performance or lower collision rates, respectively. We compared these results to a model obtained in the same way but with z = 1 rather than z = 0.5. Namely, in this configuration, the CEL algorithm aims to optimize the CM while ignoring CA. We refer to this model as the CM model. In addition, we compare the proposed model with a random walk. The random walk model assumes a uniform probability distribution over the actions and picks one action at random at each step.
The results of the CM, CA, and CMCA scores are shown as mean ± standard deviation of n = 30 simulations in Fig 6a-6c, respectively. As expected, both the CM and VCMCA models obtain flocking with a CM score of 0.75 and 0.7 while the random walk converges to 0.4, as presented in Fig 6a. On the other hand, while the CMCA model converges to a CA score of 0.74, the random walk and CM models obtained 0.29 and 0.17, respectively, as shown in Fig 6b. Of note, the CM model obtains the worst CA score out of the three. Since the CM model is trained only to flock with other agents, it disregards collisions in its training, causing the agents to move closer to each other as shown in [46].
In addition, the mean results of each model over time are shown in Table 2 such that the results are the mean ± standard deviation computed on the values presented in Fig 6a-6c, respectively. As seen in Fig 6a, the agents' convergence to cohesive movement, using the CM and CMCA models, emerges gradually, with the collective motion index increasing at slowing rates over time. This process is similar to displays of hopper-band formations, in which desert locust nymphs gather together into flocks of thousands of individuals. Due to the initially disordered state, it takes time for the visual information to be mediated between the swarm members and for them to change and adapt their heading on an individual level, which later results in the swarm reaching a coordinated heading direction on a group level. Another notable aspect is the collision rate (Fig 6b) for the CM and Random models reaching its final value in a relatively short period, in contrast to the CMCA model, whose CA score decreases moderately over the time of the simulation, reaching a saturation value more than four times larger than that of the CM model. The gradual decrease in the CA score (which is by definition equal to an increase in the collision rate) in the CMCA model could be attributed to the CM component causing the agents to align and follow each other, in turn bringing them closer together. These results align with the biological inspiration since flock members facilitate coherent movement partly through their collision-avoidance abilities [67,68], preventing them from constantly colliding with each other, which could potentially cause high mortality in flocking animals. The models' mean scores for each evaluation metric are summarized in Table 2.

Parameter sensitivity analysis
We estimate the sensitivity of the VCMCA model's performance to the population density, the agent's height-to-width ratio, and the sensing radius. Given the trained VCMCA model from the CEL algorithm, we first computed the CMCA score over n = 30 simulations with different population sizes, as presented in Fig 7. The results are shown as mean ± standard deviation. Notably, the VCMCA model is trained on the default values (see Table 1) and then evaluated on the other configurations. The results indicate an optimal flock size (P = 40), suggesting a strong link between the population's size and flocking emergence. For small population sizes, the density is insufficient for phase transformation to occur since interactions between the moving agents are scarce and happen across large time intervals, thus preventing the 'directional cluster' aggregation. In other words, agents' behavior is similar to a random walk since local interactions do not translate to the group level. An increase in population size allows for the accumulation of aligning inter-agent interactions, in which a small cluster of agents' incoherent movement gradually aligns the rest of the swarm, resulting in a large-scale coordinated motion reminiscent of natural flocks. Further increasing the population size leads to an inherent rise in the collision rate or in CA maneuvers, where agents steer away from an aligned direction to avoid an imminent collision with an approaching neighbor. Since our model strives to minimize the collision rate, relatively large populations hinder the aligning mechanism since the CA mechanism dominates the CM mechanism. Moreover, as the size of the environment was kept constant in all the experiments, there is a linear correlation between the population's size and density: $\sigma \propto |P| / (W \cdot H)$, where $\sigma$ is the population's density. Hence, one can obtain the same connection between the population's (swarm) density and the flocking emergence.
In a similar manner, we computed the influence of the agents' height-to-width ratio (i.e., $h/w$) on the CMCA score over n = 30 simulations, as shown in Fig 8. Expectedly, larger height-to-width ratios yield lower CMCA scores. As mentioned in Section 2, elongated agent morphologies, as opposed to circular ones, complicate the processing of visual stimuli. Neighbors' orientations greatly affect the subtended angle $\psi$, especially in longer agents, since the same agent at the same distance to its center could have highly differing values of $\psi$. For example, observing an agent's head or bottom vs. observing its side will greatly change the input information and introduce significant noise into the state domain. Moreover, Fig 8 shows that square-shaped agents got the highest CMCA score. One can conclude that a square, with its similarity to a disc, can artificially improve the convergence of a vision-based model [44,45], since it eliminates the problem of orientational noise with its one-to-one function of distance to the angular area. However, such agent morphologies are biologically inaccurate, especially when addressing flocking animals such as locusts, starlings, fish, etc. [69][70][71].
In a similar manner, we computed the influence of the agents' sensing radius (r) on the CMCA score over n = 30 simulations, as shown in Fig 9, such that the x-axis indicates both the sensing radius in absolute size and as the multiplier of the agent's height.

Explainability of the model
The proposed model is the outcome of multiple stochastic processes, converging into collective and intelligent behavior. In this section, we aimed to measure the ability of a simple model such as a Decision Tree (DT), or even its more complex but still explainable extension, the Random Forest (RF) model, to capture the complexity of the proposed model. First, we generate 1000000 cases at random, in the same way as we did for the testing phase of the CEL algorithm (see Section 3), and query the model, resulting in a corresponding list of actions for each state. Based on this data, we train RF models with 1000 trees to allow a large number of parameters, allowing them to fit complex dynamics [62]. The obtained RF models' accuracy relative to the VCMCA model with a restriction of the number of leaf nodes and the tree's maximum depth is shown in Fig 10a and 10b, respectively.
Unsurprisingly, as more leaves are available as well as more depth to the DTs, the better the RF model approximates the VCMCA model. However, after 320 leaf nodes (or max-depth of 8) there is no significant improvement in the model's performance over 93.37%, indicating that 6.63% of the cases are anomalies. Moreover, there is a sharp improvement between 70 leaf nodes and 80. This sharp improvement is associated with the RF model's ability to differentiate between keeping the current vector of movement or rotating 15 degrees (i.e., the smallest turn possible in the default configuration of the model) to either the right or the left. These three actions seem to be very common and the correct separation between them causes a large performance improvement. A slight improvement can be obtained by using advanced post-pruning methods such as the SAT-PP algorithm [72], but these do not significantly change the overall observed behavior.

Discussion
In this work, we propose a vision-based collective motion with collision avoidance (VCMCA) model in a two-dimensional space. In addition, we propose a collective evolution learning (CEL) algorithm that simulates the evolution process of an individual in the context of the group (population), aiming to optimize a local VCMCA task. At each generation, the individuals that perform better in the task contribute to the collective knowledge, from which the desired multi-objective collective behavior emerges.
We used the VCMCA model obtained from the CEL algorithm for the case of locusts, showing that multi-objective swarm behavior, such as CM with CA, emerges from the optimization of the individual-level objective. These results are presented in Fig 4 and Table 2. Moreover, Figs 7-9 show that the results are robust to the population's density, the agents' shape, and the sensing radius, respectively. In particular, disc-like agents with vision-based input obtain better results than agents with a more complex topology, reproducing the results of [44].
In addition, by approximating the obtained VCMCA model using explainable machine-learning models, we show that small rule-based models are not able to capture the complexity of the VCMCA model: several hundred rule combinations are required to capture only 93.37% of the VCMCA model's decisions. A model of this size is commonly considered beyond the ability of the average person, or even a domain expert, to explain [73]. This outcome suggests that the VCMCA model obtained by the CEL algorithm is not explainable even when approximated with the RF (or the DT) model, in agreement with similar results reported in [74]. Thus, while one can point to the "reason" the RF model makes each decision, this reason takes the form of a majority vote between multiple boolean satisfiability problems (on the DT level), making it impractical to analyze or to extract generic rules from. Nevertheless, while the proposed model is not explainable on the single-decision level, the overall behavior agrees with known biological behaviors [48].
These results agree with previous outcomes reported in [46], which show that agents who learned secondary goals, such as keeping contact with neighbors, were able to maintain CM. These results raise the question of whether CM or CA is the primary objective of the individual, as assumed in this work, or whether other biological objectives, such as reducing the risk of predation or increasing the probability of finding resources, are the source of these behaviors. In particular, one can evaluate the performance of CEL for z = 0 and z = 1 to determine the influence of each objective on the overall swarm behavior. In a similar manner, evaluating CEL's performance for a topological-based rather than a metric-based approach, which is believed to better represent multiple scenarios in nature, can shed light on the influence of the flocking mechanism on collective learning. Furthermore, extending the proposed model to handle three-dimensional geometry would make the model more realistic. These directions are promising future work.
Since the proposed VCMCA model extends the model proposed by Vicsek et al. [16], a comparison between the two is expected. Nonetheless, such a comparison is infeasible since Vicsek's model, as well as other similar models, uses a different type of input data for the agents. Indeed, one would need to modify the input of these models (or algorithms) to match our vision-based input to make the comparison legitimate. However, by doing so, the examined model would have to be altered as well to handle the new kind of input. Hence, the altered model cannot be assumed to have the same behavior as the original model.
Moreover, in this work, we tackle a swarm task that assumes a group of identical agents. However, in nature, agents in a population differ in their size, capability, and even action-making process. This diversity may result in additional behaviors among the agents, such as picking a leader, and may introduce new and complex challenges for the model.
These questions and many others can be addressed by introducing modifications to the VCMCA model and CEL algorithm proposed in this work. As such, we believe that the proposed approach can be used as a computational basis to investigate a more biologically accurate representation of swarm dynamics.

The collective evolution learning algorithm time complexity
Analyzing the time and memory complexity of the proposed Collective Evolution Learning (CEL) algorithm is challenging since it uses multiple levels of stochastic processes that depend on each other. First, CEL uses the Q-learning model, which converges from a uniform distribution to the learned distribution in time up to $e^{n}$, where $n$ is the number of states in the Q-learner's table [75]. Moreover, for each evaluation of the action's score for the Q-learner, CEL uses the Genetic Algorithm (GA) [76]. In order to evaluate the time complexity of the GA, it can be reduced to a "bitone" task in which the GA starts with a random population of genes, each represented by a string of bits, and aims to make the entire population identical to a pre-defined target string of bits. The target bit string is the optimal FS represented in the same manner as the gene. As such, the binary string can be of length $\min \{k : |S| < 2^{k}\}$. While the target bit string is unavailable to us, we assume that the fitness function implicitly defines it. Thus, the GA is lower bounded by an exponent of the number of generations, $g$, and the genes' population size, $\tau$. Consequently, CEL's time complexity is $O(e^{g \tau n})$.
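The reduction above can be made concrete with a minimal sketch of such a bit-string task. Everything specific here is an assumption for illustration and not the GA used inside CEL: the target is drawn explicitly (whereas in CEL it is only implied by the fitness function), selection keeps the better half of the population, crossover is one-point, the mutation rate is 1/k per bit, and the run stops as soon as one copy of the target appears rather than when the entire population matches it. The helper `bits_needed` returns the smallest k with |S| < 2^k.

```python
import random


def bits_needed(num_states):
    """Smallest k such that num_states < 2**k (gene length in the reduction)."""
    return num_states.bit_length()


def fitness(gene, target):
    """Number of positions at which the gene matches the target bit string."""
    return sum(g == t for g, t in zip(gene, target))


def evolve(target, pop_size, max_generations, rng):
    """Illustrative elitist GA; returns (generations used, sorted population)."""
    k = len(target)
    population = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    for generation in range(max_generations):
        population.sort(key=lambda gene: fitness(gene, target), reverse=True)
        if population[0] == target:
            return generation, population
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, k)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit independently with probability 1/k.
            child = [bit ^ 1 if rng.random() < 1.0 / k else bit for bit in child]
            children.append(child)
        population = parents + children
    population.sort(key=lambda gene: fitness(gene, target), reverse=True)
    return max_generations, population


rng = random.Random(0)
num_states = 1000  # hypothetical |S|
target = [rng.randint(0, 1) for _ in range(bits_needed(num_states))]
generations, population = evolve(target, pop_size=40, max_generations=200, rng=rng)
print(f"gene length k = {len(target)}, generations used = {generations}")
```

In this sketch, each generation evaluates the fitness of every gene once, so the work per run grows with both the number of generations and the population size, the two quantities that appear as $g$ and $\tau$ in the bound above. The sketch does not reproduce that bound; it only shows the structure of the task being analyzed.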

Model parameters summary
A summary of the model parameters is provided in Table 3.

Vision mechanism special cases
Following the definition of the vision mechanism described in Section 2.1, one can define three special cases for the computation of Eq (2) following the values obtained from Eq (3). These three cases are presented in Fig 11.  ≔ (l, v, a, h, w,